The Role of Algorithm Bias vs Information Source in Learning Algorithms for Morphosyntactic Disambiguation

Authors

  • Guy De Pauw
  • Walter Daelemans
Abstract

Morphosyntactic Disambiguation (Part of Speech tagging) is a useful benchmark problem for system comparison because it is typical of a large class of Natural Language Processing (NLP) problems that can be defined as disambiguation in local context. This paper adds to the literature on the systematic and objective evaluation of different methods to automatically learn this type of disambiguation problem. We systematically compare two inductive learning approaches to tagging: MXPOST (based on maximum entropy modeling) and MBT (based on memory-based learning). We investigate the effect of different sources of information on accuracy when comparing the two approaches under the same conditions. Results indicate that earlier observed differences in accuracy can be attributed largely to differences in the information sources used, rather than to algorithm bias.

1 Comparing Taggers

Morphosyntactic Disambiguation (Part of Speech tagging) is concerned with assigning morphosyntactic categories (tags) to words in a sentence, typically by employing a complex interaction of contextual and lexical clues to trigger the correct disambiguation. As a contextual clue, we might for instance assume that it is unlikely that a verb will follow an article. As a lexical (morphological) clue, we might assign a word like "better" the tag comparative if we notice that its suffix is "-er".

POS tagging is a useful first step in text analysis, but also a prototypical benchmark task for the type of disambiguation problem that is paramount in natural language processing: assigning one of a set of possible labels to a linguistic object given different information sources derived from the linguistic context. Techniques working well in the area of POS tagging may also work well in a large range of other NLP problems such as word sense disambiguation and discourse segmentation, when reliable annotated corpora providing good predictive information sources for these problems become
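The contextual and lexical clues described in the abstract can be pictured as features extracted for each word before any learning algorithm is applied. The following is a minimal sketch under stated assumptions: the function name `extract_features`, the feature names, the padding symbols and the Penn-Treebank-style tags are all illustrative and are not taken from MXPOST or MBT.

```python
from typing import Dict, List


def extract_features(words: List[str], prev_tags: List[str], i: int,
                     suffix_len: int = 2) -> Dict[str, str]:
    """Collect illustrative information sources for the word at position i.

    Hypothetical feature set for illustration only; it is not the feature
    set of MXPOST or MBT. Tokens are assumed to be non-empty strings, and
    prev_tags holds the tags already assigned to words[0..i-1].
    """
    word = words[i]
    return {
        # Contextual clues: tags to the left and the following word.
        "prev_tag": prev_tags[i - 1] if i >= 1 else "<s>",
        "prev2_tag": prev_tags[i - 2] if i >= 2 else "<s>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>",
        # Lexical (morphological) clues: the word form and its suffix.
        "word": word.lower(),
        "suffix": word[-suffix_len:].lower(),
        "capitalized": str(word[0].isupper()),
    }


words = ["This", "one", "is", "better"]
prev_tags = ["DT", "NN", "VBZ"]  # assumed tags for the three words to the left
features = extract_features(words, prev_tags, 3)
print(features)
```

Feeding the same feature dictionaries to two different learners is one way to separate the effect of the information sources from the effect of the algorithm, which is the kind of controlled comparison the abstract describes.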

Similar resources

Improvement of Routing Operation Based on Learning with Using Smart Local and Global Agents and with the Help of the Ant Colony Algorithm

Routing in computer networks has played a special role in recent years because of its effect on network performance. Quality of service and security are among the most important challenges in routing due to the lack of reliable methods. Routers use routing algorithms to find the best route to a particular destination. When talking about the best path, we consider p...

Research of Blind Signals Separation with Genetic Algorithm and Particle Swarm Optimization Based on Mutual Information

Blind source separation technique separates mixed signals blindly without any information on the mixing system. In this paper, we have used two evolutionary algorithms, namely, genetic algorithm and particle swarm optimization for blind source separation. In these techniques a novel fitness function that is based on the mutual information and high order statistics is proposed. In order to evalu...

Optimization of e-Learning Model Using Fuzzy Genetic Algorithm

The e-learning model is examined along three major dimensions, each with a range of indicators that are effective in optimization and modeling. In many optimization problems, the target function or constraints may change over time, and as a result the optimum of these problems can also change. If any of these undetermined events is considered in the optimization process, t...

Publication date: 2000